 optimal control policy


Catching heuristics are optimal control policies

Neural Information Processing Systems

Two seemingly contradictory theories attempt to explain how humans move to intercept an airborne ball. One theory posits that humans predict the ball trajectory to optimally plan future actions; the other claims that, instead of performing such complicated computations, humans employ heuristics to reactively choose appropriate actions based on immediate visual feedback. In this paper, we show that interception strategies appearing to be heuristics can be understood as computational solutions to the optimal control problem faced by a ball-catching agent acting under uncertainty. Modeling catching as a continuous partially observable Markov decision process and employing stochastic optimal control theory, we discover that the four main heuristics described in the literature are optimal solutions if the catcher has sufficient time to continuously visually track the ball. Specifically, by varying model parameters such as noise, time to ground contact, and perceptual latency, we show that different strategies arise under different circumstances. The catcher's policy switches between generating reactive and predictive behavior based on the ratio of system to observation noise and the ratio between reaction time and task duration. Thus, we provide a rational account of human ball-catching behavior and a unifying explanation for seemingly contradictory theories of target interception on the basis of stochastic optimal control.


Catching heuristics are optimal control policies

Boris Belousov, Gerhard Neumann, Constantin A. Rothkopf, Jan R. Peters

Neural Information Processing Systems

Such internal models allow for planning and potentially optimal action generation, e.g., they enable optimal catching strategies where humans predict the interception point and move there as fast as mechanically possible to await the ball. Clearly, there exist situations where latencies of the catching task require such strategies (e.g., when


Advancing Frontiers of Path Integral Theory for Stochastic Optimal Control

Patil, Apurva

arXiv.org Artificial Intelligence

Stochastic Optimal Control (SOC) problems arise in systems influenced by uncertainty, such as autonomous robots or financial models. Traditional methods like dynamic programming are often intractable for high-dimensional, nonlinear systems due to the curse of dimensionality. This dissertation explores the path integral control framework as a scalable, sampling-based alternative. By reformulating SOC problems as expectations over stochastic trajectories, it enables efficient policy synthesis via Monte Carlo sampling and supports real-time implementation through GPU parallelization. We apply this framework to six classes of SOC problems: Chance-Constrained SOC, Stochastic Differential Games, Deceptive Control, Task Hierarchical Control, Risk Mitigation of Stealthy Attacks, and Discrete-Time LQR. A sample complexity analysis for the discrete-time case is also provided. These contributions establish a foundation for simulator-driven autonomy in complex, uncertain environments.
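The idea of writing the optimal control as an expectation over stochastic trajectories can be illustrated on a toy problem. The sketch below is not taken from the dissertation; it assumes a scalar system dx = u dt + sigma dW with terminal cost 0.5*q*x(T)^2 and control cost 0.5*R*u^2, and uses the standard path integral assumption lambda = R*sigma^2. The function name `path_integral_control` and all parameter values are hypothetical. For this linear-quadratic case the optimal control is known in closed form, u* = -P(0)*x0/R with P(0) = q/(1 + q*T/R), so the Monte Carlo estimate can be checked against it.

```python
import numpy as np

def path_integral_control(x0, n_samples=100_000, horizon=1.0, dt=0.1,
                          sigma=1.0, q=1.0, R=1.0, seed=0):
    """Monte Carlo estimate of the optimal control at time 0 for the toy
    system dx = u dt + sigma dW (illustrative assumption, not the
    dissertation's setup)."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(horizon / dt))
    lam = R * sigma**2  # temperature linking control cost and noise level

    # Sample uncontrolled (u = 0) trajectories; only the noise matters here.
    xi = rng.standard_normal((n_samples, n_steps))
    x_T = x0 + sigma * np.sqrt(dt) * xi.sum(axis=1)

    # Trajectory cost: terminal cost only, since u = 0 along the samples.
    cost = 0.5 * q * x_T**2

    # Exponentiated-cost weights, shifted by the minimum for numerical safety.
    weights = np.exp(-(cost - cost.min()) / lam)
    weights /= weights.sum()

    # Optimal control = weighted average of the first noise increment per dt.
    return sigma / np.sqrt(dt) * np.dot(weights, xi[:, 0])

if __name__ == "__main__":
    x0 = 2.0
    u_mc = path_integral_control(x0)
    u_exact = -(1.0 / (1.0 + 1.0)) * x0  # q = R = T = 1 gives P(0) = 0.5
    print(f"Monte Carlo: {u_mc:.3f}, closed form: {u_exact:.3f}")
```

Because every sampled trajectory is independent, the sampling and weighting steps parallelize trivially, which is the property the abstract relies on for GPU implementations.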


Tsallis Entropy Regularization for Linearly Solvable MDP and Linear Quadratic Regulator

Hashizume, Yota, Oishi, Koshi, Kashima, Kenji

arXiv.org Artificial Intelligence

Shannon entropy regularization is widely adopted in optimal control due to its ability to promote exploration and enhance robustness, as in the maximum-entropy reinforcement learning algorithm Soft Actor-Critic. In this paper, Tsallis entropy, which is a one-parameter extension of Shannon entropy, is used for the regularization of linearly solvable MDPs and linear quadratic regulators. We derive the solutions to these problems and demonstrate their usefulness in balancing exploration against sparsity of the obtained control law.
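As a small illustration of the "one-parameter extension" mentioned above, the sketch below computes Tsallis entropy for a discrete distribution using the common definition S_q(p) = (1 - sum_i p_i^q) / (q - 1), which recovers Shannon entropy in the limit q -> 1. The paper may use a different parameterization or scaling; the function name `tsallis_entropy` is hypothetical.

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy of a discrete distribution p with entropic index q.

    Uses S_q(p) = (1 - sum_i p_i**q) / (q - 1); q == 1 falls back to
    Shannon entropy (natural log), which is the q -> 1 limit.
    """
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # zero-probability outcomes contribute nothing
    if q == 1.0:
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

if __name__ == "__main__":
    uniform = [0.5, 0.5]
    print(tsallis_entropy(uniform, 2.0))         # (1 - 0.5) / 1 = 0.5
    print(tsallis_entropy(uniform, 1.0 + 1e-6))  # close to ln 2
    print(tsallis_entropy(uniform, 1.0))         # Shannon entropy, ln 2
```

Evaluating the function near q = 1 is a quick way to confirm that the Shannon case is the limiting member of the family.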


A Physics-informed Deep Learning Approach for Minimum Effort Stochastic Control of Colloidal Self-Assembly

Nodozi, Iman, O'Leary, Jared, Mesbah, Ali, Halder, Abhishek

arXiv.org Artificial Intelligence

We propose formulating the finite-horizon stochastic optimal control problem for colloidal self-assembly in the space of probability density functions (PDFs) of the underlying state variables (namely, order parameters). The control objective is formulated in terms of steering the state PDFs from a prescribed initial probability measure towards a prescribed terminal probability measure with minimum control effort. For specificity, we use a univariate stochastic state model from the literature. Both the analysis and the computational steps for control synthesis developed in this paper generalize to multivariate stochastic state dynamics given by generic models that are nonlinear in the state and non-affine in the control. We derive the conditions of optimality for the associated optimal control problem. This derivation yields a system of three coupled partial differential equations together with the boundary conditions at the initial and terminal times. The resulting system is a generalized instance of the so-called Schrödinger bridge problem. We then determine the optimal control policy by training a physics-informed deep neural network, where the "physics" are the derived conditions of optimality. The performance of the proposed solution is demonstrated via numerical simulations on a benchmark colloidal self-assembly problem.


Using Deep Reinforcement Learning for Zero Defect Smart Forging

Ma, Yunpeng, Kassler, Andreas, Ahmed, Bestoun S., Krakhmalev, Pavel, Thore, Andreas, Toyser, Arash, Lindback, Hans

arXiv.org Artificial Intelligence

Defects during production may lead to material waste, which is a significant challenge for many companies as it reduces revenue and negatively impacts sustainability and the environment. An essential reason for material waste is a low degree of automation, especially in industries that currently have a low degree of digitalization, such as steel forging. Those industries typically rely on heavy and old machinery such as large induction ovens that are mostly controlled manually or using well-known recipes created by experts. However, standard recipes may fail when unforeseen events happen, such as an unplanned stop in production, which may lead to overheating and thus material degradation during the forging process. In this paper, we develop a digital twin-based optimization strategy for the heating process of a forging line to automate the development of an optimal control policy that adjusts the power for the heating coils in an induction oven based on temperature data observed from pyrometers. We design a digital twin-based deep reinforcement learning (DTRL) framework and train two different deep reinforcement learning (DRL) models for the heating phase using a digital twin of the forging line. The twin is based on a simulator that contains a heat transfer and movement model, which is used as an environment for the DRL training. Our evaluation shows that both models significantly reduce the temperature unevenness and can help to automate the traditional heating process.


A Reinforcement Learning Approach to Health Aware Control Strategy

Jha, Mayank Shekhar, Weber, Philippe, Theilliol, Didier, Ponsart, Jean-Christophe, Maquin, Didier

arXiv.org Artificial Intelligence

Health-aware control (HAC) has emerged as one of the domains where control synthesis is sought based upon the failure prognostics of a system or component, or on the Remaining Useful Life (RUL) predictions of critical components. Because mathematical dynamic (transition) models of RUL are rarely available, it is difficult to incorporate RUL information into the control paradigm. This paper presents a novel framework for health-aware control in which a reinforcement-learning-based approach is used to learn an optimal control policy in the face of component degradation by integrating global system transition data (generated by an analytical model that mimics the real system) and RUL predictions. The RUL predictions generated at each step are tracked to a desired value of RUL. The latter is integrated within a cost function which is maximized to learn the optimal control. The proposed method is studied using simulation of a DC motor and shaft wear.


A reinforcement learning approach to hybrid control design

Gandhi, Meet, Kundu, Atreyee, Bhatnagar, Shalabh

arXiv.org Artificial Intelligence

In this paper we design hybrid control policies for hybrid systems whose mathematical models are unknown. Our contributions are threefold. First, we propose a framework for modelling the hybrid control design problem as a single Markov Decision Process (MDP). This result facilitates the application of off-the-shelf algorithms from the Reinforcement Learning (RL) literature towards designing optimal control policies. Second, we model a set of benchmark examples of hybrid control design problems in the proposed MDP framework. Third, we adapt the recently proposed Proximal Policy Optimisation (PPO) algorithm for the hybrid action space and apply it to the above set of problems. It is observed that in each case the algorithm converges and finds the optimal policy.


Model-free optimal control of discrete-time systems with additive and multiplicative noises

Lai, Jing, Xiong, Junlin, Shu, Zhan

arXiv.org Machine Learning

This paper investigates the optimal control problem for a class of discrete-time stochastic systems subject to additive and multiplicative noises. A stochastic Lyapunov equation and a stochastic algebraic Riccati equation are established for the existence of the optimal admissible control policy. A model-free reinforcement learning algorithm is proposed to learn the optimal admissible control policy using data on the system states and inputs, without requiring any knowledge of the system matrices. It is proven that the learning algorithm converges to the optimal admissible control policy. The implementation of the model-free algorithm is based on batch least squares and numerical averaging. The proposed algorithm is illustrated through a numerical example, which shows that it outperforms other policy iteration algorithms.